
    Distributed BLAST in a grid computing context

    The Basic Local Alignment Search Tool (BLAST) is one of the best-known sequence comparison programs in bioinformatics. It compares query sequences against a set of target sequences in order to find similar sequences in the target set. Here, we present a distributed BLAST service that operates over a set of heterogeneous Grid resources and is made available as a Globus Toolkit v3 Grid service. This work was carried out in the context of BRIDGES, a UK e-Science project aimed at providing a Grid-based environment for biomedical research. Input consisting of multiple query sequences is partitioned into sub-jobs according to the number of idle compute nodes available, and the sub-jobs are then processed on these nodes in batches. To achieve this, we implemented our own Java-based scheduler, which distributes sub-jobs across an array of resources that use a variety of local job scheduling systems.
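    The partitioning step can be pictured with a minimal Python sketch. The function name `partition_queries` and the contiguous, near-equal-size split are assumptions made for illustration; the abstract does not say how the scheduler sizes or orders sub-jobs, and the real scheduler is written in Java.

```python
from typing import List, Sequence


def partition_queries(queries: Sequence[str], idle_nodes: int) -> List[List[str]]:
    """Split query sequences into at most `idle_nodes` contiguous sub-jobs.

    Hypothetical helper: sub-job sizes differ by at most one query, so each
    idle node receives a roughly equal share of the input.
    """
    if idle_nodes < 1:
        raise ValueError("at least one idle node is required")
    n_jobs = min(idle_nodes, len(queries))
    if n_jobs == 0:
        return []
    base, extra = divmod(len(queries), n_jobs)
    sub_jobs: List[List[str]] = []
    start = 0
    for i in range(n_jobs):
        # The first `extra` sub-jobs each take one additional query.
        size = base + (1 if i < extra else 0)
        sub_jobs.append(list(queries[start:start + size]))
        start += size
    return sub_jobs


if __name__ == "__main__":
    print(partition_queries(["q0", "q1", "q2", "q3", "q4"], idle_nodes=2))
```

    With five queries and two idle nodes this yields [['q0', 'q1', 'q2'], ['q3', 'q4']]; each inner list would become one sub-job submitted to a node's local scheduler.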

    Simplified amino acid alphabets based on deviation of conditional probability from random background

    The primitive data for deducing the Miyazawa-Jernigan contact energy or the BLOSUM score matrix consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of this conditional probability from the random background, we propose a scheme for reducing the amino acid alphabet. We observe a clear discrepancy between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the resulting coarse-grained substitution matrices and verify that the reduced alphabets preserve well the information contained in the original 20-letter alphabet. Comment: 9 pages, 3 figures
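    The idea of measuring each residue's conditional distribution against the background can be sketched as follows. This is a minimal illustration under stated assumptions: the deviation is taken as P(b|a) - p(b), and residues are grouped by single-linkage with a Euclidean distance threshold. The grouping rule, the threshold and the function names are hypothetical; the abstract does not specify how the paper clusters residues.

```python
import math
from typing import Dict, List, Tuple

Counts = Dict[Tuple[str, str], float]


def deviation_vectors(counts: Counts, alphabet: List[str]) -> Dict[str, List[float]]:
    """For each residue a, return the vector P(b|a) - p(b) over all residues b."""
    total = sum(counts.get((a, b), 0.0) for a in alphabet for b in alphabet)
    # Background p(b): column sums of the pair-count table divided by the total.
    background = [sum(counts.get((a, b), 0.0) for a in alphabet) / total for b in alphabet]
    vectors: Dict[str, List[float]] = {}
    for a in alphabet:
        row = [counts.get((a, b), 0.0) for b in alphabet]
        row_sum = sum(row)
        vectors[a] = [r / row_sum - p for r, p in zip(row, background)]
    return vectors


def reduce_alphabet(vectors: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Hypothetical grouping: single-linkage merge of residues closer than `threshold`."""
    letters = list(vectors)
    parent = {x: x for x in letters}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(letters):
        for b in letters[i + 1:]:
            if math.dist(vectors[a], vectors[b]) <= threshold:
                parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for x in letters:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())
```

    On a toy four-letter table where A and B (and likewise C and D) co-occur with identical pair counts, the deviation vectors of A and B coincide and the sketch merges them, returning [['A', 'B'], ['C', 'D']]. A real reduction would start from the 20x20 Miyazawa-Jernigan or BLOSUM pair counts.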

    A eubacterial origin for the human tRNA nucleotidyltransferase?

    tRNA CCA-termini are generated and maintained by tRNA nucleotidyltransferases. Together with poly(A) polymerases and other enzymes, they belong to the nucleotidyltransferase superfamily. However, sequence alignments within this family do not allow one to distinguish between CCA-adding enzymes and poly(A) polymerases. Furthermore, owing to the lack of sequence information about animal CCA-adding enzymes, it has not been possible so far to identify the corresponding animal genes. We therefore looked for the human homolog using the baker's yeast tRNA nucleotidyltransferase as the query sequence in a BLAST search. This revealed that the human gene transcript CGI-47 (#AF151805) deposited in GenBank is likely to encode such an enzyme. To determine the nature of this protein, the cDNA of the transcript was cloned and the recombinant protein was biochemically characterized, indicating that CGI-47 encodes a bona fide CCA-adding enzyme and not a poly(A) polymerase. This confirmed animal CCA-adding enzyme allowed us to identify putative homologs from other animals. A neighbor-joining tree, calculated from an alignment of several CCA-adding enzymes, revealed that the animal enzymes resemble eubacterial ones more closely than they resemble the eukaryotic plant and fungal tRNA nucleotidyltransferases, suggesting that the animal nuclear cca genes might have been derived from the endosymbiotic progenitor of mitochondria and are therefore of eubacterial origin.

    Efficient chaining of seeds in ordered trees

    We consider the problem of chaining seeds in ordered trees. Seeds are mappings between two trees Q and T, and a chain is a subset of non-overlapping seeds that is consistent with respect to postfix order and ancestrality. This problem is a natural extension of a similar problem for sequences and has applications in computational biology, such as mining a database of RNA secondary structures. For the chaining problem with a set of m constant-size seeds, we describe an algorithm that runs in O(m² log m) time and O(m²) space.
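    To make the notion of a chain concrete, here is a minimal sketch of the simpler sequence version of the problem that the tree problem extends. It is a textbook quadratic dynamic program, not the tree algorithm from the paper: seeds are assumed to be (query_start, target_start, length) triples with positive length, and a chain is a set of seeds that do not overlap and appear in the same order in both sequences, scored by total seed length.

```python
from typing import List, Tuple

Seed = Tuple[int, int, int]  # (query_start, target_start, length), length >= 1


def best_chain(seeds: List[Seed]) -> Tuple[int, List[Seed]]:
    """Highest-scoring chain of seeds in the *sequence* setting, in O(m^2) time."""
    if not seeds:
        return 0, []
    order = sorted(seeds)  # any valid predecessor has a smaller query start
    score = [s[2] for s in order]
    prev = [-1] * len(order)
    for j, (qj, tj, lj) in enumerate(order):
        for i in range(j):
            qi, ti, li = order[i]
            # Seed i may precede seed j only if it ends before j starts in both sequences.
            if qi + li <= qj and ti + li <= tj and score[i] + lj > score[j]:
                score[j] = score[i] + lj
                prev[j] = i
    end = max(range(len(order)), key=score.__getitem__)
    chain: List[Seed] = []
    k = end
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    return score[end], chain[::-1]


if __name__ == "__main__":
    print(best_chain([(0, 0, 3), (4, 5, 2), (2, 10, 4), (7, 8, 3)]))
```

    In the example the seed (2, 10, 4) is dropped because it crosses the others, and the chain (0, 0, 3), (4, 5, 2), (7, 8, 3) with score 8 is returned. In ordered trees, the compatibility test must additionally respect postfix order and ancestrality, which is what the paper's algorithm handles.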

    SimSearch: A new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences

    http://www.informatik.uni-trier.de/%7Eley/db/conf/iwpacbb/iwpacbb2008.html
    In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch fills the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is then applied when analysing the maximal extension of the pattern. Using bit parallelism to analyse the upper triangle of the global similarity matrix, the method searches the sequence(s) for all exact and approximate patterns in regular or reverse order. The algorithm can be parameterized to work with larger seeds to obtain near-optimal results. Performance tests show a significant efficiency improvement over traditional optimal methods based on dynamic programming. When its efficiency is compared with that of heuristic methods at equal sensitivity, the proposed algorithm remains acceptable. This work has been partially supported by PRODEP.
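    The general idea of a binary similarity matrix processed with bit parallelism can be illustrated with a short sketch. It is not SimSearch: it omits distance series, the scoring scheme, approximate matching and reverse-order search, and all function names are hypothetical. Each row of the upper triangle is stored as a Python integer used as a bit vector, so bit j of row i is set when a later position j holds the same symbol as position i (at distance j - i), and exact repeats of length k are found by ANDing shifted rows.

```python
from typing import Dict, List, Tuple


def symbol_masks(seq: str) -> Dict[str, int]:
    """Bit j of masks[c] is set iff seq[j] == c."""
    masks: Dict[str, int] = {}
    for pos, ch in enumerate(seq):
        masks[ch] = masks.get(ch, 0) | (1 << pos)
    return masks


def upper_rows(seq: str) -> List[int]:
    """Row i of the upper triangle: bit j is set iff j > i and seq[j] == seq[i]."""
    masks = symbol_masks(seq)
    return [masks[ch] & ~((1 << (i + 1)) - 1) for i, ch in enumerate(seq)]


def exact_repeats(seq: str, k: int) -> List[Tuple[int, int]]:
    """All pairs (i, j), i < j, with seq[i:i+k] == seq[j:j+k], using word-level ANDs."""
    if k < 1:
        raise ValueError("k must be positive")
    rows = upper_rows(seq)
    hits: List[Tuple[int, int]] = []
    for i in range(len(seq) - k + 1):
        acc = rows[i]
        for d in range(1, k):
            # Shift row i+d back by d so that bit j now means seq[j+d] == seq[i+d].
            acc &= rows[i + d] >> d
        while acc:
            low = acc & -acc  # isolate the lowest set bit
            hits.append((i, low.bit_length() - 1))
            acc ^= low
    return hits


if __name__ == "__main__":
    print(exact_repeats("ACGTACG", 3))
```

    For "ACGTACG" and k = 3 the only repeat is ACG at positions 0 and 4, so the sketch prints [(0, 4)]. Each AND compares a whole row at once, which is the source of the speed-up that bit-parallel methods exploit.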

    Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny

    Accurate reconstruction of phylogenies remains a key challenge in evolutionary biology. Most biologically plausible formulations of the problem are formally NP-hard, with no known efficient solution. In practice, the standard is fast heuristic methods that are empirically known to work very well in general but can yield results arbitrarily far from optimal. Practical exact methods, which have exponential worst-case running times but generally run much faster in practice, provide an important alternative. We report progress in this direction by introducing a provably optimal method for the weighted multi-state maximum parsimony phylogeny problem. The method is based on generalizing the notion of the Buneman graph, a construction key to efficient exact methods for binary sequences, so that it applies to sequences with arbitrary finite numbers of states and arbitrary state transition weights. We implement an integer linear programming (ILP) method for the multi-state problem using this generalized Buneman graph and demonstrate that the resulting method can solve data sets that are intractable for prior exact methods, in run times comparable with those of popular heuristics. Our work provides the first method for provably optimal maximum parsimony phylogeny inference that is practical for multi-state data sets of more than a few characters. Comment: 15 pages
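    The objective being optimized, the weighted multi-state parsimony cost, can be made concrete with Sankoff's classic dynamic program, which scores one character on a *fixed* tree with arbitrary transition weights. This sketch is not the paper's method: the generalized Buneman graph and the ILP search over candidate phylogenies, whereas Sankoff's algorithm only evaluates a given tree. The nested-tuple tree encoding and the function name are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple, Union

# A rooted binary tree: a leaf label, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def sankoff_cost(tree: Tree, leaf_state: Dict[str, int], weight: List[List[float]]) -> float:
    """Minimum weighted parsimony cost of one character on a fixed tree.

    weight[s][t] is the cost of a change from state s (parent) to state t (child).
    """
    k = len(weight)

    def subtree_costs(node: Tree) -> List[float]:
        # costs[s] = cheapest cost of this subtree if the node is assigned state s.
        if isinstance(node, str):
            return [0.0 if s == leaf_state[node] else math.inf for s in range(k)]
        left, right = node
        cl, cr = subtree_costs(left), subtree_costs(right)
        return [
            min(weight[s][t] + cl[t] for t in range(k))
            + min(weight[s][t] + cr[t] for t in range(k))
            for s in range(k)
        ]

    return min(subtree_costs(tree))


if __name__ == "__main__":
    w = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # a direct 0 <-> 2 change is expensive
    print(sankoff_cost(("a", "b"), {"a": 0, "b": 2}, w))
```

    With these weights the cheapest labeling places state 1 at the root, so the example prints 2.0 rather than the cost 5 of a direct change. The cost of a multi-character data set is the sum of such per-character costs, and a maximum parsimony method seeks the tree that minimizes that sum.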

    Ty1 insertions in intergenic regions of the genome of Saccharomyces cerevisiae transcribed by RNA polymerase III have no detectable selective effect

    The retrotransposon Ty1 of Saccharomyces cerevisiae inserts preferentially into intergenic regions in the vicinity of genes transcribed by RNA polymerase III. It has been suggested that this preference evolved to minimize the deleterious effects of element transposition on the host genome, and thus to favor the elements' evolutionary survival. This presupposes that such insertions have no selective effect; however, this hypothesis has never been tested directly. Here we construct a series of strains that contain single Ty1 insertions in the vicinity of tRNA genes or in the rDNA cluster on chromosome XII, and that are otherwise isogenic to strain 337, which contains no Ty1 elements. Competition experiments between 337 and the strains containing single Ty1 insertions revealed that, in all cases, the Ty1 insertions have no selective effect in rich medium. These results are thus consistent with the hypothesis that the insertion site preference of Ty1 elements has evolved to minimize the deleterious effects of transposition on the host genome.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/72266/1/S1567-1356_03_00199-5.pd

    Functional characterization of a melon alcohol acyl-transferase gene family involved in the biosynthesis of ester volatiles. Identification of the crucial role of a threonine residue for enzyme activity

    Volatile esters, a major class of compounds contributing to the aroma of many fruits, are synthesized by alcohol acyl-transferases (AATs). We demonstrate here that, in Charentais melon (Cucumis melo var. cantalupensis), AATs are encoded by a gene family of at least four members, with amino acid identity ranging from 84% (Cm-AAT1/Cm-AAT2) and 58% (Cm-AAT1/Cm-AAT3) to only 22% (Cm-AAT1/Cm-AAT4). All encoded proteins except Cm-AAT2 were enzymatically active upon expression in yeast and showed differential substrate preferences. Cm-AAT1 produces a wide range of short- and long-chain acyl esters but has a strong preference for forming E-2-hexenyl acetate and hexyl hexanoate. Cm-AAT3 also accepts a wide range of substrates but has a very strong preference for producing benzyl acetate. Cm-AAT4 is almost exclusively devoted to the formation of acetates, with a strong preference for cinnamoyl acetate. Site-directed mutagenesis demonstrated that the failure of Cm-AAT2 to produce volatile esters is related to the presence of an alanine residue at position 268 instead of the threonine found in all active AAT proteins. Mutating A268 to T in Cm-AAT2 restored enzyme activity, while mutating T268 to A abolished the activity of Cm-AAT1. The activities of all three active proteins, measured with their preferred substrates, increase sharply during fruit ripening. The expression of all Cm-AAT genes is up-regulated during ripening and inhibited in antisense ACC oxidase melons and in fruit treated with the ethylene antagonist 1-methylcyclopropene (1-MCP), indicating positive regulation by ethylene. The data presented in this work suggest that the multiplicity of AAT genes accounts for the great diversity of esters formed in melon.